Abstract

We study the best-choice problem for processes which generalise the process of
records from Poisson-paced i.i.d. observations. Under the assumption that the
observer knows the distribution of the process and the horizon, we determine the optimal
stopping policy and, for a parametric family of problems, also derive an explicit
formula for the maximum probability of recognising the last record.

1 Introduction 

Maximising the probability of stopping at the extreme of a sequence of random marks 
is the classical objective in sequential decision problems widely known as the best-choice 
or 'secretary' problems [31E3I22]. Problems of this kind can be formulated in terms of
the embedded process of records, because the overall extreme (e.g. minimum) is the last 
record observation. 

In a basic version of the problem introduced by Gilbert and Mosteller [10, Section 3]
the marks are sampled at discrete times from the uniform distribution, and the objective
of the observer is to stop at the minimum among the first $n$ marks. The sequence
of values of sequential minima, called lower records, undergoes a stick-breaking process
$X_1,\ X_1X_2,\ X_1X_2X_3, \ldots$, where the $X_j$'s are independent copies of a prototypical random
factor $X$ whose distribution is uniform. Given the record values, the durations of records
are independent, and for $r$ a generic record value, the duration of a record with this value
has geometric distribution with parameter $r$. See E3 Elj for these basic facts of the
theory of random records. The optimal policy in the stopping problem of [10] is rather
complicated, as it involves a sequence of thresholds for which no closed-form expression is
available, and for the same reason there is no explicit formula for the optimal probability.

According to another version of the problem, the marks are observed at epochs of a
unit Poisson process, and the goal is to stop at the minimum mark before a given horizon
$T$ (see [12] and references therein). This problem allows much more explicit results: the
optimal policy prescribes stopping at the first time the record process breaks through a
hyperbolic boundary, and there is an explicit formula for the optimal probability. The
continuous-time problem corresponds to the model sometimes called Poisson-paced records
[Tol Section 9]; the difference from the discrete-time model is that the duration of a record
with value $r$ has exponential distribution with rate $r$. For large $n$ and $T$ the discrete
and continuous time versions are close to the same limiting form introduced in [11];
in particular the limiting optimal probability is given by the formula first obtained by
Samuels [21] in the discrete setting. These and related results are reviewed in Section 7.3.

*Institute of Mathematics, Utrecht University, The Netherlands, gnedin@math.uu.nl

In this paper we consider the continuous-time problem of recognising the last record
under a more general assumption that the occurrences of records follow a stick-breaking
scheme, with factor $X$ having an arbitrary distribution on the unit interval. Models of this
kind appear in many contexts such as branching processes, search problems, sequential
packing problems and random partitions 13 El ED. Although we just postulate the
behaviour of records without any reference to some richer observable process, the
model in focus is related to one concept of sequential extreme for sampling from certain
partially ordered spaces, including spaces $\mathbb{R}^d$ with continuous product distributions. This
connection is detailed in Section 3.

We will show that the optimal policy is always of the same form as in the case of
uniform $X$. In the special case of a parametric family of beta distributions we express the
optimal probability in terms of the incomplete gamma function. In general, however, it
does not seem possible to write a closed-form expression for the stopping value. Still,
we argue that under minor side conditions on the law of $X$, as $T \to \infty$, there exists a
limiting value which may be interpreted as the optimal probability of recognising the last
record in a stopping problem with infinitely many observations. The famous best-choice
probability benchmark $e^{-1}$ will show up as a sharp lower bound.

2 The model 

We shall model the occurrences of records by means of a nonincreasing right-continuous
Markov process $R = (R_t,\ t \ge 0)$ with the following type of behaviour: given the current
state is $r > 0$, the process jumps at rate $r$ to a new state $rX$, where $X$ is a prototypical
random factor with a given distribution in the open interval $]0, 1[$. In the event $t$ is a
jump instant of $R$ we say that a record occurs at time $t$ and interpret $R_t$ as the weight of
the record. The weights of consecutive records decrease, while the sojourns of $R$, which
include the first record time and further durations of records, are stochastically increasing.

In more detail, the weights of records undergo stick-breaking $r_0X_1,\ r_0X_1X_2, \ldots$, where
the $X_j$'s are independent replicas of $X$ and $r_0 = R_0$ is the initial state of $R$. The sequence of
sojourns may be represented as

$$E_1/r_0,\quad E_2/(r_0X_1),\quad E_3/(r_0X_1X_2),\ \ldots$$

where the $E_j$'s are i.i.d. unit exponential variables, independent of the $X_j$'s. Thus,
conditionally given the weights of records, the sojourns are independent exponential variables.
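
The representation above translates directly into a simulation. The following is a minimal sketch, not part of the original text: the function name `simulate_records` and the sampler argument `sample_factor` are illustrative assumptions, and the uniform factor is used only as an example.

```python
import random

def simulate_records(r0, horizon, sample_factor, rng):
    """Return the (time, weight) pairs of the records of R occurring before `horizon`.

    r0            -- initial state R_0 > 0
    sample_factor -- hypothetical helper: draws one copy of X in ]0, 1] from rng
    """
    t, r = 0.0, r0
    records = []
    while True:
        t += rng.expovariate(r)      # sojourn in state r is exponential with rate r
        if t >= horizon:
            return records
        r *= sample_factor(rng)      # stick-breaking step r -> r X
        records.append((t, r))

rng = random.Random(1)
# Uniform factor X; 1 - random() lies in (0, 1], which avoids a zero weight.
path = simulate_records(1.0, 50.0, lambda g: 1.0 - g.random(), rng)
for time, weight in path:
    print(f"record at t = {time:8.4f} with weight {weight:.6f}")
```

Each pass through the loop draws one sojourn $E_j/(r_0X_1\cdots X_{j-1})$ and then one factor $X_j$, so the weights decrease while the expected sojourns grow.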

We are interested in the problem of maximising the probability of recognising the last
record of $R$ before a given horizon $T$, by means of a nonanticipating policy (stopping
time) adapted to the natural right-continuous filtration of $R$. For $\pi$ such a policy the
efficiency is measured by the probability that $\pi$ is a record time not exceeding $T$ and that
no further record occurs before time $T$:

$$\mathbb{P}(R_{\pi-} > R_\pi = R_T,\ \pi < T) = \mathbb{E}\left[\exp\{-(T-\pi)R_\pi\}\, 1(R_{\pi-} > R_\pi,\ \pi < T)\right], \qquad (1)$$

where the second expression involves the adapted probability of recognising the last record
when the stopping occurs.

In the terminology going back to Gilbert and Mosteller [10], this stopping problem
should be qualified as a problem with 'full information', meaning that the observer learns
the weights of records and knows their distribution exactly. By a 'no-information' problem
we understand the optimal stopping problem where only policies based on record
times are allowed.

3 Chain records 

Sampling from an arbitrary continuous distribution $F$ leads to the stick-breaking process for
records with uniform $X$. This is seen by defining the weight via $v \mapsto F(v)$ and by noting
that this mapping preserves the ranking and transforms a sample from $F$ into a sequence
of uniform variables. In this section we discuss some extensions of this framework.

Sampling from certain discrete distributions also leads to a stick-breaking process for
records. Define a distribution by allocating the geometric masses $pq^{k-1}$ (where $p + q = 1$
and $0 < p < 1$) at points of some decreasing sequence $z_k$, $k = 1, 2, \ldots$. Consider strict
lower records in a sample from such a distribution. Define the weights by means of the
left-continuous distribution function $v \mapsto F(v-)$. If the first sample value is $z_k$, then
the next observation is a record with probability $q^k$; from this we see that the weights of
records follow the stick-breaking scheme with factor

$$X \stackrel{d}{=} \sum_{k=1}^{\infty} pq^{k-1}\,\delta_{q^k},$$

where $\delta_x$ is the Dirac mass at $x$ and $\stackrel{d}{=}$ denotes the equality in distribution.

Sampling from other distributions on the reals is not consistent with the stick-breaking
model for records. We now look at higher dimensions.

Consider $\mathbb{R}^d$ endowed with some continuous product distribution $\mu$ and the natural
strict partial order $\prec$. For a sample $V_1, V_2, \ldots$ from $(\mathbb{R}^d, \mu)$, we say that a chain record
occurs at index $j$ if either $j = 1$, or $j > 1$ and $V_j$ is $\prec$-smaller than the last chain record in
the sequence $V_1, \ldots, V_{j-1}$. Define the weight of a chain record by means of the multivariate
distribution function $v \mapsto \mu\{u \in \mathbb{R}^d : u \prec v\}$. The weights of chain records follow a
stick-breaking process with the density $\mathbb{P}(X \in dx)/dx = |\log x|^{d-1}/(d-1)!$ for the factor $X$.
Indeed, the componentwise probability transform establishes an isomorphism between the
ordered probability space $(\mathbb{R}^d, \mu, \prec)$ and the unit cube $[0, 1]^d$ with the Lebesgue measure,
which implies that the law of $X$ is the same as the distribution of the product of $d$
independent uniform variables, whence the formula for the density.
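
The definition of chain records can be illustrated on the unit cube, where the weight of a point is simply the product of its coordinates. The sketch below is an illustration under that assumption; the function name `chain_records` and the sample size are hypothetical choices, not taken from the paper.

```python
import math
import random

def chain_records(points):
    """Greedy decreasing chain: (index, weight) of each chain record, indices 1-based."""
    out, last = [], None
    for j, v in enumerate(points, start=1):
        # v is a chain record if it is the first point or strictly below the
        # last chain record in every coordinate
        if last is None or all(a < b for a, b in zip(v, last)):
            last = v
            # weight = Lebesgue measure of {u : u < v}, the product of coordinates
            out.append((j, math.prod(v)))
    return out

rng = random.Random(7)
d = 3
sample = [tuple(rng.random() for _ in range(d)) for _ in range(10_000)]
records = chain_records(sample)
for index, weight in records:
    print(index, weight)
```

Successive weights are obtained from one another by multiplication with a factor distributed like a product of $d$ independent uniform variables, in line with the density stated above.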

Chain records in $\mathbb{R}^d$ were introduced in [13]. Unlike other kinds of multidimensional
records surveyed in [17], the chain records cannot be regarded as 'generalised minima',
because permutations of $V_1, \ldots, V_{j-1}$ may destroy or create a chain record at index $j$.
The sequence of chain-record marks is a 'greedy' decreasing chain in the partially ordered
sequence of marks, in the sense that element $V_j$ is joined to the chain each time the
monotonicity constraint is not violated (to be compared, e.g., with the longest chain
among the first $n$ marks).

The definition of chain record extends in an obvious way to sampling from an arbitrary
Borel space $Z$ endowed with a probability measure $\mu$ and a measurable strict partial order
$\prec$. The weights are defined by means of the function $v \mapsto \mu(L_v)$, where $L_v = \{u \in Z :
u \prec v\}$ is the lower section of $\prec$ at $v \in Z$. Call the space $(Z, \mu, \prec)$ lower-homogeneous if
(i) $\mu(L_v) > 0$ for $\mu$-almost all points $v \in Z$, and (ii) the lower section $L_v$ with conditional
measure $\mu(\cdot)/\mu(L_v)$ is isomorphic, as a partially ordered probability space, to the whole
space $(Z, \mu, \prec)$. Since all $L_v$'s are in this sense the same, the weights of chain records
in a sample from a lower-homogeneous space undergo a stick-breaking with the factor
$X \stackrel{d}{=} \mu\{u \in Z : u \prec V\}$ where $V$ has distribution $\mu$.

It is easily seen that $[0, 1]^d$ with uniform distribution is a lower-homogeneous space,
hence this is true also for $\mathbb{R}^d$ with continuous product distribution. Another example is the
interval space which has intervals $]a, b[\ \subset [0, 1]$ as elements, the partial ordering $\prec$ defined
by inclusion, and a measure $\mu(da\,db) = \alpha(\alpha-1)(b-a)^{\alpha-2}\,da\,db$ (with parameter $\alpha > 1$);
in this case $\mathbb{P}(X \in dx)/dx = (\alpha-1)(x^{-1/\alpha} - 1)$. Although both examples are instances
of Bollobás-Brightwell box-spaces jH] (which have all intervals $\{u : v \prec u \prec w\}$ for $v \prec w$
isomorphic to the whole space and not only $L_v$'s), there are many other lower-homogeneous
spaces that are not box-spaces. By the transitivity of partial order, the distribution of $X$
appearing in this way must satisfy the inequality $\mathbb{P}(X \le x) \ge x$, $x \in [0, 1]$.

4 Stopping the embedded Markov chain 

A fundamental property of the process $R$ is self-similarity: for each $r > 0$, the law of
$(R_t,\ t \ge 0)$ with $R_0 = r$ is identical to the law of $(rR_{rt},\ t \ge 0)$ given $R_0 = 1$. This
implies that, when the law of $X$ is fixed, the 'size' of the problem is determined by a
single parameter $r_0T$.

Self-similarity is a clue to deriving the optimal policy. If stopping has not occurred before
and including time $t$ and if the current state is $R_t = r$, then the conditional optimal
stopping problem is equivalent to the unconditional problem with initial state 1 and
horizon $(T-t)r$. This motivates associating with $R$ (with fixed parameters $r_0$, $T$ and the
law for $X$) another decreasing Markov process $B = (B_t,\ t \ge 0)$,

$$B_t = (T-t)_+ R_t, \qquad t \ge 0,$$

with the initial state $B_0 = r_0T$ and the absorbing terminal state 0. Obviously, it is
sufficient to consider the policies adapted to $B$, with the understanding that the last record
before $T$ corresponds to the last jump of $B$ before absorption at 0. The sequence of
locations visited by $B$ at the record times is a discrete-time homogeneous Markov chain
which follows the transition scheme $s \mapsto (s-E)_+X$, $s > 0$, where $E$ is a rate-1 exponential
variable independent of $X$.

In terms of the embedded chain the optimal policy is determined in a standard way,
by comparing two kinds of risk. If the current state of $B$ is $s$, the probability that no
further records occur is $p_0(s) = e^{-s}$. On the other hand, the probability that exactly one
record will occur is

$$p_1(s) = \int_0^s e^{-t}\,\mathbb{E}[p_0((s-t)X)]\,dt = e^{-s}\,\mathbb{E}\left[\frac{e^{s(1-X)} - 1}{1-X}\right].$$


Inspecting the two extremes $s = 0$ and $s = \infty$ and exploiting monotonicity, we see that the
equation

$$\mathbb{E}\left[\frac{e^{s(1-X)} - 1}{1-X}\right] = 1 \qquad (2)$$

has a unique positive solution $s_*$. Because

$$p_0(s) < p_1(s) \iff s > s_*,$$

and because $B$ has decreasing paths (until getting absorbed) we are in the familiar monotone
case of optimal stopping, hence the optimal policy stops at the first jump of $B$ within
the region $[0, s_*]$. Translating this back in terms of $R$ we see that it is optimal to stop
at the first record time when the condition $(T-t)R_t \le s_*$ is satisfied. In particular, if
$Tr_0 \le s_*$ it is optimal to stop at the very first record.

More generally, for $s > 0$ we denote by $\pi_s$ the policy which prescribes stopping at the first
record time when $(T-t)R_t \le s$ holds. Summarising the above discussion we conclude:

Proposition 1. The policy $\pi_{s_*}$ with $s_*$ satisfying (2) is optimal.
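
The root of (2) is easy to approximate numerically. The sketch below is only an illustration: it expands the left-hand side of (2) as the series $\sum_{n\ge 1} \frac{s^n}{n!}\,\mathbb{E}[(1-X)^{n-1}]$ and assumes that the integer moments $\mathbb{E}[X^k]$ are available through a user-supplied function `mellin`; the names `lhs` and `critical_threshold` are hypothetical.

```python
import math

def lhs(s, mellin, terms=40):
    """Left-hand side of (2), E[(e^{s(1-X)} - 1)/(1 - X)], via its power series in s."""
    total = 0.0
    for n in range(1, terms + 1):
        # E[(1-X)^(n-1)] expanded binomially in the moments E[X^k] = mellin(k)
        moment = sum((-1) ** k * math.comb(n - 1, k) * mellin(k) for k in range(n))
        total += s ** n / math.factorial(n) * moment
    return total

def critical_threshold(mellin, tol=1e-12):
    """Bisection for the root s_* of (2); the root is sought in ]0, 1[."""
    lo, hi = 0.0, 1.0            # lhs(0) = 0 < 1, and lhs(1) > 1 for X in ]0, 1[
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lhs(mid, mellin) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

s_uniform = critical_threshold(lambda k: 1.0 / (k + 1))       # X uniform
s_log = critical_threshold(lambda k: 1.0 / (k + 1) ** 2)      # X with density |log x|
print(s_uniform, s_log)
```

The two example laws are the uniform distribution and the product of two independent uniform variables, whose moments are $1/(k+1)$ and $1/(k+1)^2$ respectively.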

Assuming $R_0 = 1$, let $v(T, s)$ be the value of the policy $\pi_s$, i.e. the probability that
exactly one record before $T$, occurring at time $t$ with weight $r$, satisfies $(T-t)r \le s$.
(By self-similarity the case of arbitrary $R_0 = r_0$ can be reduced to that.) Obviously,

$$v(T, s) = p_1(T) \quad \text{for } T \le s. \qquad (3)$$

The first-record decomposition readily yields an integral equation

$$v(T, s) = \int_0^T \mathbb{E}\left[v(X(T-t), s)\,1((T-t)X > s) + e^{-(T-t)X}\,1((T-t)X \le s)\right] e^{-t}\,dt,$$

which for $s = s_*$ is the familiar dynamic programming equation for the optimal value. In
the differential form this becomes

$$\partial_T v(T, s) = -v(T, s) + \mathbb{E}[v(TX, s)\,1(TX > s)] + \mathbb{E}[e^{-TX}\,1(TX \le s)]. \qquad (4)$$

The equation (4) is of delayed type, which only in exceptional cases admits a closed-form
solution. For instance, when the distribution of $X$ is $\delta_x$, the solution is a
piecewise-analytical function which should be computed recursively in the intervals
$T \in [s/x^{k-1}, s/x^k]$ for $k = 1, 2, \ldots$, starting from $[0, s]$ where $v(T, s) = p_1(T)$ holds.

The collection of sites which $B$ visits at record times is not a Poisson process, since
otherwise $v(T, s)$ would be constant in $T$ for $T \ge s$. It is therefore surprising that the maximum
of $v(T, s)$ in $s$ is attained at the same point $s_*$, for all $T \ge s_*$.
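
Since (4) rarely has a closed-form solution, the value $v(T, s)$ can be checked by simulation. The following Monte Carlo sketch is an illustration only: the names `success` and `estimate_value`, the number of runs and the seed are assumptions, and the uniform factor with the rounded threshold $s = 0.804$ serves as an example.

```python
import random

def success(T, s, sample_factor, rng):
    """One run from R_0 = 1: does the policy pi_s stop at the last record before T?"""
    t, r = 0.0, 1.0
    stopped = False
    while True:
        t += rng.expovariate(r)          # sojourn in state r
        if t >= T:
            return stopped               # no further record: success iff already stopped
        if stopped:
            return False                 # a record arrived after stopping
        r *= sample_factor(rng)          # weight of the new record
        if (T - t) * r <= s:
            stopped = True               # first record inside the stopping region

def estimate_value(T, s, sample_factor, runs, seed=0):
    """Monte Carlo estimate of v(T, s)."""
    rng = random.Random(seed)
    return sum(success(T, s, sample_factor, rng) for _ in range(runs)) / runs

uniform = lambda g: 1.0 - g.random()     # X uniform on (0, 1]
v_hat = estimate_value(T=50.0, s=0.804, sample_factor=uniform, runs=100_000)
print(v_hat)
```

Because $B$ has decreasing paths, a run succeeds exactly when the first record inside the stopping region is also the last record before $T$, which is what the loop checks.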
5 The lower bound

Suppose for a while that the law of $X$ is $\delta_1$. In this case (4) is easily solved as
$v(T, s) = (T \wedge s)\,e^{-(T \wedge s)}$. Thus $s_* = 1$ and for $T \ge 1$ the optimal probability is $v(T, s_*) = e^{-1}$,
which also coincides with the maximum of $p_1(s) = se^{-s}$. To bring this conclusion into
the familiar 'no-information' framework note that $R_t = 1$, hence there is no updating of
record weights. For the same reason, the record times are the epochs of a unit Poisson
process, hence the stopping problem amounts to recognising the last Poisson epoch on
$[0, T]$, which is the 'no-information' problem for the Poisson process due to Browne [7]. A
characteristic feature of this case is that $v(T, s)$ is constant in $T$ for $T \ge s$ (see the last
remark in Section 4).

We show next that the familiar benchmark $e^{-1} = 0.367\ldots$ yields a universal lower
bound in our model.

Proposition 2. For every distribution of $X$ the optimal probability satisfies $v(T, s_*) \ge
e^{-s_*}$ for $T \ge s_*$. Moreover, $s_* < 1$, hence

$$v(T, s_*) > e^{-1} \quad \text{for } T \ge 1,$$

and this bound is sharp.

Proof. Suppose $r_0T > s_*$. The process $B$ can enter $[0, s_*]$ by either continuously drifting
down or jumping down through $s_*$. In the first case the conditional probability of success
with $\pi_{s_*}$ is $p_1(s_*) = p_0(s_*) = e^{-s_*}$. In the second case this probability is $\mathbb{E}[e^{-S}] \ge e^{-s_*}$,
with some random $S \le s_*$, because $\pi_{s_*}$ will stop. The estimate readily follows.

Applying the inequality $e^{1-x} > 1 + (1-x)$ for $0 < x < 1$, we see that the left-hand
side of (2) is larger than 1 for $s = 1$, therefore the root satisfies $s_* < 1$. The bound $e^{-1}$ is
approached by letting the law of $X$ approach $\delta_1$. □

The same argument yields a more general inequality $v(T, s) \ge \min(p_0(s), p_1(s))$ for $T \ge s$,
where the right side assumes the largest value at $s = s_*$.

For $X$ uniform $s_* = 0.804\ldots$ and the lower bound is $e^{-s_*} = 0.447\ldots$, while for $X$
with density $|\log x|$ these are $0.743\ldots$ and $0.475\ldots$.

6 Entrance from the infinity 

For asymptotic considerations we shall vary the initial state and denote by $\mathbb{P}_r$ the law of $R$
with $R_0 = r$. Assume that $\mathbb{E}|\log X| < \infty$ and that the distribution of $X$ is not supported
by a geometric progression (note that these are precisely the conditions for applicability
of the renewal theorem [H] to $-\log X$). Let

$$f(\lambda) = \mathbb{E}[X^\lambda]$$

be the Mellin transform of $X$. Clearly, $-f'(0) = \mathbb{E}|\log X|$. Adapting Theorem 1] we
have:
Proposition 3. Under the above assumptions, as $r \to \infty$, the law $\mathbb{P}_r$ has a weak limit
$\mathbb{P}_\infty$ characterised by $R_t \stackrel{d}{=} Y/t$, $t > 0$, where $Y$ is a random variable uniquely determined
by its moments

Corollary 4. Under these circumstances there exists a limit $v(\infty, s_*) = \lim_{T\to\infty} v(T, s_*)$
which is the maximum probability of recognising the last record for the process $(R_t,\ t \le 1)$
under $\mathbb{P}_\infty$.

Proof. This follows from the form of the optimal policy and the fact that the point process
of sites visited by $B$ at record times has a weak limit as $B_0 \to \infty$. □

The law of $Y$ determined by (5) can be considered as a kind of extreme-value distribution.
For instance, $Y$ is exponential for $X$ uniform, while $Y$ is distributed like the product of
independent uniform and exponential variables for $X$ with density $|\log x|$.

Denoting by $\tau_1, \tau_2, \rho_1, \rho_2$ the times and weights of the last record and the record before
the last, the performance of $\pi_s$ in the infinite problem can be written as

$$v(\infty, s) = \mathbb{P}_\infty((1-\tau_1)\rho_1 \le s < (1-\tau_2)\rho_2) = \mathbb{P}_\infty((1-\tau_1)\rho_1 \le s) - \mathbb{P}_\infty((1-\tau_2)\rho_2 \le s). \qquad (6)$$

In principle, the moments (5) determine the distribution of these variables, for instance

$$\mathbb{P}_\infty(\tau_1 < t) = \mathbb{E}\left[e^{-Y(1-t)/t}\right],$$

but it seems impossible to use this for writing $v(\infty, s)$ in some explicit form.



7 The beta case 

We proceed with more concrete computations under the assumption that the distribution
of $X$ is beta$(\theta, 1)$, with the density

$$\mathbb{P}(X \in dx)/dx = \theta x^{\theta-1}, \qquad x \in [0, 1],$$

where $\theta$ is a positive parameter. The instance $\theta = 1$ corresponds to the uniform distribution.
This class of stick-breaking processes has the feature that under $\mathbb{P}_\infty$ both the range
of $R$ and the point process of record times are Poisson point processes with intensity
measure $\theta\,dz/z$, $z > 0$. The law of $R_1$ under $\mathbb{P}_\infty$ is a gamma distribution.
The integral

$$p_1(s) = e^{-s}\int_0^1 \frac{e^{s(1-x)} - 1}{1-x}\,\theta x^{\theta-1}\,dx$$

does not simplify, hence it should be included in the final formula for the optimal probability
as it is.
7.1 Computing the value

For $T > s$ a substitution translates (4) into

$$T^\theta\,\partial_T v(T, s) = -T^\theta v(T, s) + \int_s^T v(t, s)\,\theta t^{\theta-1}\,dt + \int_0^s e^{-t}\,\theta t^{\theta-1}\,dt.$$

Differentiating in $T$ and simplifying we are led to

$$Tg'' + (T + \theta)g' = 0 \qquad (7)$$

for $g(T) = v(T, s)$. Solving this and taking into account the boundary condition at $T = s$
yields

$$v(T, s) = \Gamma(-\theta+1, s, T)\,e^s s^\theta\,p_1'(s) + p_1(s), \quad \text{for } T \ge s, \qquad (8)$$

where

$$\Gamma(a, b, c) = \int_b^c e^{-t}t^{a-1}\,dt$$

denotes the incomplete gamma function, and

$$p_1'(s) = -p_1(s) + \frac{\theta}{s^\theta}\,\Gamma(\theta, 0, s).$$

For the optimal $s_*$, using $p_1(s_*) = e^{-s_*}$ we obtain from (8)

$$v(T, s_*) = \Gamma(-\theta+1, s_*, T)\left[-s_*^\theta + e^{s_*}\,\theta\,\Gamma(\theta, 0, s_*)\right] + e^{-s_*}, \qquad (9)$$

which is the optimal probability of stopping at the last record. The formula is valid for
$T \ge s_*$. The optimal probability $v(\infty, s_*)$ in the limit problem is obtained simply by taking
$T = \infty$ in the integral in (9), which reads as a generalised exponential integral function

$$\Gamma(-\theta+1, s_*, \infty) = \int_{s_*}^\infty \frac{e^{-t}}{t^\theta}\,dt.$$

The following table shows some numerical values of this probability computed with the
help of Mathematica.

θ            0.1    0.25   0.5    1      2      5      20
s_*          0.709  0.731  0.760  0.804  0.857  0.922  0.976
v(∞, s_*)    0.913  0.814  0.703  0.580  0.481  0.410  0.377
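
The table entries can be approximated without a computer algebra system. The sketch below is an independent illustration, not the computation used for the table: it solves (2) for the beta$(\theta, 1)$ law through the series $\sum_{n \ge 1} s^n\,\Gamma(\theta+1)/(n\,\Gamma(\theta+n))$, evaluates $\Gamma(\theta, 0, s)$ by its power series, and applies Simpson's rule to the integral in (9) with $T = \infty$. All function names, the truncation length and the grid size are assumptions.

```python
import math

def critical_threshold(theta, tol=1e-12):
    """Root s_* of (2) for X ~ beta(theta, 1), found by bisection on ]0, 1[."""
    def lhs(s):
        # s^n/n! * E[(1-X)^(n-1)] = s^n * Gamma(theta+1) / (n * Gamma(theta+n))
        return sum(s ** n * math.exp(math.lgamma(theta + 1) - math.lgamma(theta + n)) / n
                   for n in range(1, 80))
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if lhs(mid) < 1.0 else (lo, mid)
    return 0.5 * (lo + hi)

def lower_gamma(theta, s, terms=60):
    """Gamma(theta, 0, s) = int_0^s e^{-t} t^{theta-1} dt via its power series."""
    return s ** theta * sum((-s) ** n / (math.factorial(n) * (theta + n))
                            for n in range(terms))

def upper_integral(theta, s, length=40.0, n=40_000):
    """Simpson approximation of int_s^inf e^{-t} t^{-theta} dt; tail beyond s+length dropped."""
    h = length / n
    f = lambda t: math.exp(-t) * t ** (-theta)
    total = f(s) + f(s + length)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(s + i * h)
    return total * h / 3.0

def limit_value(theta):
    """Return (s_*, v(inf, s_*)) according to formula (9) with T = infinity."""
    s = critical_threshold(theta)
    bracket = -s ** theta + theta * math.exp(s) * lower_gamma(theta, s)
    return s, upper_integral(theta, s) * bracket + math.exp(-s)

for theta in (0.1, 0.25, 0.5, 1, 2, 5, 20):
    s_star, v_inf = limit_value(theta)
    print(f"theta = {theta:<5} s_* = {s_star:.3f}  v(inf, s_*) = {v_inf:.3f}")
```

The fixed integration window of length 40 is a pragmatic choice: the neglected tail is of order $e^{-40}$, far below the three decimals shown in the table.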

The data suggest examining the extreme values of the parameter $\theta$.

As $\theta \to \infty$ the beta distribution approaches $\delta_1$, hence $s_* \to 1$ and $v(\infty, s_*) \to e^{-1}$.
Thus the beta family may be interpreted as a bridge between the 'full-information' problem
($\theta = 1$) and the 'no-information' problem ($\theta = \infty$).

Note that, for arbitrary $\theta > 0$, in consequence of the Poisson character of the record
times under $\mathbb{P}_\infty$, the time-threshold policy $\pi = \min\{t > T/e :\ R_{t-} > R_t\}$ yields the limit
probability of success equal to $e^{-1}$ for $T \to \infty$.

As $\theta \to 0$ the beta distribution approaches $\delta_0$. In this regime the optimal $s_*$ approaches
$\log 2$. Selecting $T_0$ sufficiently large to secure the occurrence of at least one record
with probability at least $1 - \epsilon$, and then sending $\theta$ to 0, we will have $v(T_0, s_*) > 1 - \epsilon$,
because with high probability exactly one record occurs before horizon $T_0$. For $T > T_0$,
(9) implies $v(T, s_*) > v(T_0, s_*)$, therefore the trivial upper bound $v(\infty, s_*) < 1$ is sharp
as the law of $X$ varies.
7.2 A smooth fit

With the explicit formula (8) in hand we can alternatively characterise $s_*$ as the maximiser
of $v(T, s)$ in $s$. Equating $\partial_s v(T, s)$ to 0 we see then that $s_*$ is a root of the equation

$$s\,p_1''(s) + (s + \theta)\,p_1'(s) = 0, \qquad (10)$$

which is, in fact, equivalent to $p_0(s) = p_1(s)$ due to the identity

$$s\,p_1''(s) + (s + \theta)\,p_1'(s) = \theta\,(p_0(s) - p_1(s)).$$

Comparing (10) with (7) shows that the two branches of $v(T, s_*)$, for $T < s_*$ and $T > s_*$,
match at $T = s_*$ together with two derivatives. This degree of smoothness is characteristic
for $s = s_*$, as is also seen by the following argument. Write the probability of success
with policy $\pi_s$ as

$$v(T, s) = \int_0^s p_1'(t)\,dt + \int_s^T \partial_T v(t, s)\,dt,$$

and note that the optimisation of $s$ amounts to finding a 'switch' which maximises the
sum of integrals. Inspecting the monotonicity properties of the integrands

$$p_1'(t) = e^{-t}t^{-\theta}\,\big(p_1'(t)\,t^\theta e^t\big) \quad\text{and}\quad \partial_T v(t, s) = e^{-t}t^{-\theta}\,\big(p_1'(s)\,s^\theta e^s\big)$$

shows that the maximum is achieved if they are tangential at the switching location,
which is precisely the condition (10). Thus $s_*$ is indeed the only value of $s$ such that
$\partial_T v(T, s)$ has no break at $T = s$.



7.3 The uniform case 

For completeness we bring together known formulas for the case of uniform factor $X$.
The solution to

$$\int_0^s \frac{e^t - 1}{t}\,dt = 1$$

has the approximate value $s_* = 0.804\ldots$ The limit probability

$$v(\infty, s_*) = (e^{s_*} - s_* - 1)\int_{s_*}^\infty \frac{e^{-t}}{t}\,dt + e^{-s_*} = 0.580\ldots,$$

was obtained first numerically in [10] by interpolation from discrete-time problems, derived
in [21] from (JBj) and shown in [3] by some series computations with the Poisson process. The
analogous formula for $v(T, s)$ with finite $T$ appeared in [12].

The process of records under $\mathbb{P}_\infty$ corresponds to the set of $\prec$-minimal atoms of a unit-rate
Poisson point process on $\mathbb{R}_+^2$ (recall that an atom is $\prec$-minimal if there are no other
Poisson atoms south-west of it). The unique properties of the planar Poisson process
allow more delicate computations. Under $\mathbb{P}_\infty$ the density of $\pi_s$ is jTJ]

P(tt s G dt)/dt = ^(e~* s - e- ts / (1 -*)) + sT (o, st, ^) + 1 - e~ st , t G [0, 1], 
This integrates to some number less than 1 because with positive probability $\pi_s$ does not
stop at all (this probability is approximately $0.1995\ldots$ for the optimal policy $\pi_{s_*}$). The
optimal probability can also be represented as the integral

$$v(\infty, s_*) = \int_0^1 w(t)\,dt,$$

with $w(t)$ the winning rate, equal to the chance that $\pi_{s_*}$ stops correctly in time $dt$. The
graph of $w$ was sketched long ago [10, Figure 3], and the following explicit formula for the
winning rate is a recent result [TB] :

e ~ s *t _ e — s*t/(i-t) e~ s »* — te~ s * 

w(t) = _ e - + _ + _ + 

i^K"-.T^)- r («.-'.^. 

(the boundary values $w(0) = 1 - e^{-s_*}$, $w(1) = e^{-s_*}$ were indicated in [10]).

8 Concluding remarks 

A discrete-time version of the problem with fixed horizon $n$ is associated with a process
analogous to $R$ but with geometric durations of records. The optimal policy is known
only for uniform $X$. Moreover, it is not clear if the monotone case of optimal stopping
applies for the general distribution of $X$. Using techniques from [IBj one can show that
under the assumptions of Section 6 the discrete-time problem can be approximated, for $n$
large, by the limiting problem with continuous time, hence the policy 'stop at index $j$ if
a record occurs with weight $r$ satisfying $r(n - j) \le s_*$' is asymptotically optimal.

It would also be interesting to evaluate suboptimal policies like 'stop at the first record
with weight below a given $w$' or 'stop at the first record that occurs after a given time $t_0$'.
This is not so easy in general since such policies are not adapted to $B$.